Perceptually weighted linear transformations for voice conversion
Abstract
Voice conversion is a technique for modifying a source speaker’s speech so that it sounds as if it were spoken by a target speaker. A popular approach to voice conversion is to apply a linear transformation to the spectral envelope. However, conventional parameter estimation based on least-squares error optimization does not necessarily lead to the best perceptual result. In this paper, a perceptually weighted linear transformation is presented which is based on the minimization of the perceptual spectral distance between the voices of the source and target speakers. The paper describes the new conversion algorithm and presents a preliminary evaluation of the method's performance based on objective and subjective tests.
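The contrast between plain least-squares estimation and a weighted criterion can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the perceptual distance or how the weights are derived, so the per-frame, per-coefficient `weights` below are a hypothetical stand-in, and the transformation is restricted to an independent affine map per spectral coefficient for brevity. The function names `fit_weighted_affine` and `convert` are likewise illustrative.

```python
from typing import List, Sequence, Tuple


def fit_weighted_affine(
    src: Sequence[Sequence[float]],
    tgt: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Fit tgt[t][d] ~ a[d] * src[t][d] + b[d] by weighted least squares.

    src, tgt and weights are time-aligned lists of frames, each frame a list
    of D values. weights[t][d] >= 0 is a hypothetical perceptual weight
    (an assumption, not taken from the paper). Uniform weights reduce this
    to ordinary least squares.
    """
    n_dims = len(src[0])
    params = []
    for d in range(n_dims):
        sw = sum(w[d] for w in weights)
        # Weighted means of the source and target coefficient d.
        mx = sum(w[d] * x[d] for w, x in zip(weights, src)) / sw
        my = sum(w[d] * y[d] for w, y in zip(weights, tgt)) / sw
        # Weighted variance of the source and weighted covariance with the target.
        sxx = sum(w[d] * (x[d] - mx) ** 2 for w, x in zip(weights, src))
        sxy = sum(
            w[d] * (x[d] - mx) * (y[d] - my)
            for w, x, y in zip(weights, src, tgt)
        )
        a = sxy / sxx
        b = my - a * mx
        params.append((a, b))
    return params


def convert(frame: Sequence[float], params: Sequence[Tuple[float, float]]) -> List[float]:
    """Apply the fitted per-coefficient affine map to one source frame."""
    return [a * x + b for x, (a, b) in zip(frame, params)]
```

With uniform weights every frame contributes equally to the fit. Giving a frame or coefficient a larger weight pulls the fit toward it, which is the general mechanism by which a perceptually motivated weighting can trade numerical error for perceived quality.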
Similar resources
A Study on Bag of Gaussian Model with Application to Voice Conversion
GMM-based mapping techniques have proved to be an efficient way to find a nonlinear regression function between two spaces and have found success in voice conversion. In these methods, a linear transformation is estimated for each Gaussian component, and the final conversion function is a weighted sum of all linear transformations. These linear transformations fit well for the samples near to...
A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
This paper presents a comparison of methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral database in German. The output is modified using voice conversion techniques to match the target expressive styles, the focus...
The linear transformation of LF glottal waveforms for voice conversion
Most Voice Conversion (VC) systems exploit source-filter decomposition based on linear prediction (LP) to transform spectral envelopes, incurring as a result various issues related to the oversimplification of the LP voice source model. Whilst residual prediction methods can mitigate this problem, they cannot be used to modify voice source quality. In this paper, a system which employs linear t...
Spectral voice conversion based on unsupervised clustering of acoustic space
Voice conversion systems aim at modifying a source speaker’s speech so that it is perceived as if a target speaker had spoken it. Applying voice conversion techniques to a concatenative text-to-speech synthesizer allows for the personification of such systems, so that additional voices from a single source-speaker database can be produced quickly and automatically. This paper presents a new alg...
Analysis of speaker clustering strategies for HMM-based speech synthesis
This paper describes a method for speaker clustering, with the application of building average voice models for speaker-adaptive HMM-based speech synthesis that are a good basis for adapting to specific target speakers. Our main hypothesis is that using perceptually similar speakers to build the average voice model will be better than using unselected speakers, even if the amount of data available...